Merchants that operate network-accessible marketplaces may maintain electronic catalogs that include thousands of items (or more) offered for sale. These electronic catalogs often include item detail pages accessible through one or more networks (e.g., web pages accessible over the Internet). These item detail pages often include descriptive information (e.g., text descriptions, pictures or video) to assist a buyer in determining whether an item is worth purchasing. In many cases, this descriptive information may be based on information from manufacturers or suppliers of the items offered for sale. In some cases, different manufacturers and suppliers may provide the descriptive information to the merchant in different formats. For example, one supplier may list one type of identifier for an item as a part number whereas another supplier of that same item might list that identifier as a model number. In some cases, one supplier may provide very detailed item information for an item whereas another supplier might provide only very basic information. For instance, one supplier might include a text description including comprehensive marketing literature whereas another supplier might omit such a description and include only basic information, such as a part or model number. Due at least in part to these types of variations in item information received from different suppliers of the same item, identifying duplicate item information (e.g., two sets of item information that may be different but nevertheless describe the same item) may be a nontrivial task.
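By way of illustration only, the variation described above can be sketched in Python. The records, field names, values, and the hand-written comparison rule below are all hypothetical assumptions introduced for this example; they are not taken from any supplier feed, and the rule is not the genetically created rule set described herein. The sketch shows only that two sets of item information may differ field by field while still plausibly describing the same item.

```python
# Hypothetical item information from two suppliers of the same item.
# All field names and values are illustrative assumptions.
supplier_a = {
    "part_number": "AB-1234",
    "title": "Cordless Drill",
    "description": "Detailed marketing text for the drill.",
}
supplier_b = {
    # Same identifier, but labeled as a model number and formatted differently.
    "model_number": "ab1234",
}

# Assumed set of field names that may carry the same kind of identifier.
IDENTIFIER_FIELDS = ("part_number", "model_number")


def normalized_identifier(record):
    """Return the first identifier found, uppercased and stripped of
    non-alphanumeric characters, or None if the record has no identifier."""
    for field in IDENTIFIER_FIELDS:
        value = record.get(field)
        if value:
            return "".join(ch for ch in value.upper() if ch.isalnum())
    return None


def may_describe_same_item(record_a, record_b):
    """Hand-written example rule: the records are possible duplicates when
    their normalized identifiers are present and equal."""
    id_a = normalized_identifier(record_a)
    id_b = normalized_identifier(record_b)
    return id_a is not None and id_a == id_b


# A field-by-field comparison treats the records as different items.
exact_match = supplier_a == supplier_b

# The example rule flags them as possibly describing the same item.
possible_duplicate = may_describe_same_item(supplier_a, supplier_b)
```

A single hand-written rule of this kind covers only the one variation it was written for, which is one reason a larger rule set may be desirable.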
While the system and method for genetic creation of a rule set for duplicate detection is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the system and method for genetic creation of a rule set for duplicate detection is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the system and method for genetic creation of a rule set for duplicate detection to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the system and method for genetic creation of a rule set for duplicate detection as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.